# Replication Data for: "Auditing the Hallucination Gap in Large Language Models for Utility Infrastructure Management"

## Contents
- `generate_synthetic_data.py`: Python script to generate the synthetic datasets.
- `synthetic_water_asset_data.csv`: Synthetic asset inventory data.
- `synthetic_break_records.csv`: Synthetic water main break records.

## Purpose
The original data from the City of Sugar Land contains sensitive information related to public infrastructure security and cannot be shared. These synthetic datasets replicate the structure and key statistical relationships of the original data to allow for full methodological reproducibility.

## Instructions
1. Run the Python script `generate_synthetic_data.py` to generate the two CSV files.
2. The script sets the random seed to 42, so every run produces the same data.
3. The synthetic data can then be used as input for the analysis scripts provided in the main study.
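
The steps above can be checked with a short script. The following is a minimal sketch, not part of the replication package: it assumes that `generate_synthetic_data.py` writes both CSV files into the directory it is run from, and that a fixed seed yields byte-identical files across runs (which may not hold across different library versions). The helper names `sha256_of`, `summarize_csv`, and `check_reproducibility` are hypothetical.

```python
import csv
import hashlib
import subprocess
import sys
from pathlib import Path

# File names taken from the Contents section of this README.
SCRIPT = "generate_synthetic_data.py"
OUTPUTS = ["synthetic_water_asset_data.csv", "synthetic_break_records.csv"]


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def summarize_csv(path):
    """Return (header, number_of_data_rows) without assuming column names."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def check_reproducibility(workdir="."):
    """Run the generator twice and report whether each output is unchanged.

    Assumption: the script writes both CSVs into `workdir`.
    Returns {filename: True if both runs produced identical bytes}.
    """
    digests = []
    for _ in range(2):
        subprocess.run([sys.executable, SCRIPT], cwd=workdir, check=True)
        digests.append({name: sha256_of(Path(workdir) / name) for name in OUTPUTS})
    return {name: digests[0][name] == digests[1][name] for name in OUTPUTS}
```

Calling `check_reproducibility()` from the directory containing the script should return `True` for both files if the seed fully determines the output. `summarize_csv` can then be used to inspect each file's header and row count before passing the data to the analysis scripts.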

## Notes
This data is entirely artificial and cannot be traced to any real-world utility system. It is provided solely for the purpose of validating the computational methods used in the associated paper.